{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href='http://www.holoviews.org'><img src=\"../../assets/hv+bk.png\" alt=\"HV+BK logos\" width=\"40%;\" align=\"left\"/></a>\n",
    "<div style=\"float:right;\"><h2>Exercise 2: Datasets and Collections of Data</h2></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import holoviews as hv\n",
    "\n",
    "hv.extension('bokeh')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "macro = pd.read_csv('../../data/macro.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start by inspecting the ``macro`` dataframe using the ``head`` method to see which columns it contains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now plot the ``growth`` by ``year`` using a ``Scatter`` element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution1\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution1\" class=\"collapse\">\n",
    "    <br>\n",
    "    <code>hv.Scatter(macro, 'year', 'growth')</code>\n",
    "</div>\n",
    "\n",
    "Now declare the 'country' and 'unem' columns as additional value dimensions (``vdims``) of the ``Scatter`` object, and set ``color_index='country'`` and ``size_index='unem'`` as plot options.\n",
    "\n",
    "<b><a href=\"#hint1\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint1\" class=\"collapse\">\n",
    "The ``kdims`` and ``vdims`` arguments accept a single dimension or lists of dimensions.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution2\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution2\" class=\"collapse\">\n",
    "<br>\n",
    "<code>%%opts Scatter [color_index='country' size_index='unem']\n",
    "hv.Scatter(macro, 'year', ['growth', 'country', 'unem'])</code>\n",
    "</div>\n",
    "\n",
    "You should now have a fairly complex plot showing growth by year, where each point is colored by country and sized by unemployment, but you will immediately notice several issues with this plot. First identify the issues, then try to address them, using tab-completion in the ``%%opts`` magic to discover the available options.\n",
    "\n",
    "<b><a href=\"#hint2\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint2\" class=\"collapse\">\n",
    "The plot width can be adjusted as in previous examples, the color mapping can be controlled using the ``cmap`` style option, and the legend position (right, left, etc.) can be changed using the ``legend_position`` plot option. If you don't know what value to supply for the ``cmap`` option, try passing an empty string and executing the cell to see a list of possible values.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution3\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution3\" class=\"collapse\">\n",
    "<br>\n",
    "<code>%%opts Scatter [color_index='country' size_index='unem' width=800 height=400 legend_position='left'] (cmap='tab20')\n",
    "hv.Scatter(macro, 'year', ['growth', 'country', 'unem']).sort(['year', 'country'])</code>\n",
    "</div>\n",
    "\n",
    "\n",
    "### Example 2: \n",
    "\n",
    "This time we will be working with a dataset about diamonds:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "diamonds = pd.read_csv('../../data/diamonds.csv').sample(5000)\n",
    "diamond_ds = hv.Dataset(diamonds, ['cut', 'color', 'clarity'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, first inspect the data to see what columns it has. But this time, instead of looking at the dataframe ``diamonds`` itself, look at the string representation of the ``diamond_ds`` Dataset object.\n",
    "\n",
    "<b><a href=\"#hint3\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint3\" class=\"collapse\">\n",
    "Printing any HoloViews object shows its string representation. That is also the default representation for a Dataset, since a Dataset on its own is not yet visualizable.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike the default dataframe table representation, this representation should clearly separate the key dimensions from the value dimensions.\n",
    "\n",
    "Using the ``.to`` method on ``diamond_ds``, plot the 'price' against the 'carat' column using a ``Scatter`` element, and use the ``groupby`` keyword argument to split the dataset by 'clarity'.\n",
    "\n",
    "<b><a href=\"#hint4\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint4\" class=\"collapse\">\n",
    "The ``.to`` method has the signature ``dataset.to(Element, kdims, vdims, groupby)``.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution4\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution4\" class=\"collapse\">\n",
    "<br>\n",
    "<code>diamond_ds.to(hv.Scatter, 'carat', 'price', groupby='clarity')</code>\n",
    "</div>\n",
    "\n",
    "The resulting plot lets you view each subset separately by selecting a value with a widget. Now use the ``.overlay`` method on the grouped dataset to overlay the individual plots. Then adjust the width and height of the plot, enable a logarithmic y-axis, and define a custom color cycle using ``Cycle('Category20')`` as a style option.\n",
    "\n",
    "<b><a href=\"#hint5\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint5\" class=\"collapse\">\n",
    "A ``Cycle`` can be used for any style option, allowing that option to take a different value for each element in an Overlay. Here, to set a color cycle, just set the ``color`` option to be a ``Cycle``. You can set the width and height on either ``Scatter`` or ``NdOverlay``, but only ``Scatter`` will have a ``color`` option.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution5\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution5\" class=\"collapse\">\n",
    "<br>\n",
    "<code>%%opts Scatter [width=600 height=400 logy=True] (color=Cycle('Category20'))\n",
    "diamond_ds.to(hv.Scatter, 'carat', 'price', groupby='clarity').overlay()</code>\n",
    "</div>\n",
    "\n",
    "### Example 3:\n",
    "\n",
    "Now let's look at the same dataset in a different way. Again using the ``.to`` method, plot the 'price' broken down by 'cut' and grouped by 'clarity', but use a ``BoxWhisker`` element to summarize the distribution. Save the result into a variable named ``hm`` before displaying it.\n",
    "\n",
    "<b><a href=\"#hint6\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint6\" class=\"collapse\">\n",
    "Make sure to specify 'price' as the value dimension, i.e. as the ``vdims`` argument that follows the key dimension 'cut'. If you assign a HoloViews object to a variable, it won't be displayed, so put the name of the variable on the last line of the cell to see it.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution6\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution6\" class=\"collapse\">\n",
    "<br>\n",
    "<code>hm = diamond_ds.to(hv.BoxWhisker, 'cut', 'price', groupby='clarity')\n",
    "hm</code>\n",
    "</div>\n",
    "\n",
    "This time let's lay out the grouped dimension as a grid. Use the ``.grid`` method on the grouped dataset. Then enable the ``shared_xaxis`` and ``shared_yaxis`` plot options on the resulting ``GridSpace``. Finally, set a custom ``xrotation`` to rotate the x-axis ticks, being careful to apply it to the correct element type.\n",
    "\n",
    "<b><a href=\"#hint7\" data-toggle=\"collapse\">Hint</a></b>\n",
    "\n",
    "<div id=\"hint7\" class=\"collapse\">\n",
    "Just like elements, containers such as a ``GridSpace`` have plot options, which can be specified using ``%%opts GridSpace [...]``. However, ``xrotation`` should be applied not to the ``GridSpace`` itself but to the underlying element in each grid cell.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><a href=\"#solution7\" data-toggle=\"collapse\">Solution</a></b>\n",
    "\n",
    "<div id=\"solution7\" class=\"collapse\">\n",
    "<br>\n",
    "<code>%%opts GridSpace [shared_xaxis=True shared_yaxis=True] BoxWhisker [xrotation=45]\n",
    "diamond_ds.to(hv.BoxWhisker, 'cut', 'price', groupby='clarity').grid()</code>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
